Pattern Mining Across Domain-Specific Text Collections
نویسندگان
چکیده
This paper discusses a consistency in patterns of language use across domain-specific collections of text. We present a method for the automatic identification of domain-specific keywords – specialist terms – based on comparing language use in scientific domain-specific text collections with language use in texts intended for a more general audience. The method supports automatic production of collocational networks, and of networks of concepts – thesauri, or so-called ontologies. The method involves a novel combination of existing metrics from work in computational linguistics, which can enable extraction, or learning, of these kinds of networks. Creation of ontologies or thesauri is informed by international (ISO) standards in terminology science, and the resulting resource can be used to support a variety of work, including data-mining applications.
منابع مشابه
Data Mining
Data Mining provides approaches for the identification and discovery of non-trivial patterns and models hidden in large collections of data. In the applied natural language processing domain, data mining usually requires preprocessed data that has been extracted from textual documents. Additionally, this data is often integrated with other data sources. This chapter provides an overview on data...
متن کاملCharacterizing Regulatory Documents and Guidelines Based on Text Mining
Implementing rules, constraints, and requirements contained in regulatory documents such as standards or guidelines constitutes a mandatory task for organizations and institutions across several domains. Due to the amount of domain-specific information and actions encoded in these documents, organizations often need to establish cooperations between several departments and consulting experts to...
متن کاملSystematic text-mining approach for deriving aspects and patterns from domain knowledge
As the theoretical underpinnings of aspect-orientation mature, its application across the software lifecycle has expanded. An active area of research focuses on the application of aspect oriented techniques to unstructured or semi-structured requirements documents. In this context, primary issues involve the identification of early aspects and various forms of aspectual manipulation (e.g., weav...
متن کاملCross-Domain Mining of Argumentative Text through Distant Supervision
Argumentation mining is considered as a key technology for future search engines and automated decision making. In such applications, argumentative text segments have to be mined from large and diverse document collections. However, most existing argumentation mining approaches tackle the classification of argumentativeness only for a few manually annotated documents from narrow domains and reg...
متن کاملText Data Mining with Optimized Pattern Discovery
This paper describes an application of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of nding the patterns that optimizes a given statistical measure in a large collection of unstructured texts. For this class of patterns, ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005